Key word grammars are defined to be the same as context free
grammars, except that a production may specify a string of arbitrary
symbols. These grammars define languages similar to those used in
the programs CARPS and ELIZA. We show a method of implementing
the LR(k) parsing algorithm for context free grammars which can be
modified slightly in order to parse key word grammars. When this is
done the algorithm can use many of the techniques used in the ELIZA
parse. Therefore, the algorithm helps to show the relation between
the classical parsers and key word parsers.
1. The Scheme

We indicate the basic idea of the LR(k) parsing scheme by giving an
example. A formal description and discussion of the method can be found in
Knuth. Consider the following context free grammar:

1   S -> S#
2   S -> E + E
3   E -> F * F
4   E -> F
5   F -> x
6   F -> y

Fig. 1

The string x * y + x# lies in the language generated by this grammar.
We can parse this string with the LR(k) algorithm. This algorithm makes
one pass through the string from left to right. The parameter k refers
to the number of characters which the algorithm "looks ahead" at each
step. We will take k = 1. The complete parse of the string is shown in
Fig. 2.

Fig. 2. Parsing the string x * y + x#

This figure shows the successive stages of the push-down-stack used in
the parse. Each rectangle is named by the symbol Si at its top left; the
successive stages of the stack are:

S0
S0 x S1
S0 F S2
S0 F S2 * S3
S0 F S2 * S3 y S4
S0 F S2 * S3 F S5
S0 E S6
S0 E S6 + S7
S0 E S6 + S7 x S8
S0 E S6 + S7 F S9
S0 E S6 + S7 E S10
S0 S S11
S0 S S11 # S12

Let us see how the information in the successive rectangles, and the
corresponding stages of the stack, are generated. Each Si is a set of states
of the form [right side of a production, terminal character] with a . placed
just before one character on the right side of the production. The terminal
character is the character which must be the next input character if a
reduction of the stack corresponding to the production whose right side is
given in the state is to be made. To form S0, we ask what productions
could possibly lead to the first character of any input string. Since all
derivations of an acceptable input string must start with production 1,
S -> S#, we start S0 with the state [.S#, e], indicating that we are looking for
the string S# followed by no input character, and indicating with the placement
of the . before the S that we have not yet found any of the characters of this
string. Now from the grammar we see that in order to find the first character
of this string, S, which is to be followed by a #, we must find the string
E + E followed by a #, so we add the state [.E + E, #] to S0. Similarly, to find
an E followed by a + we must find either F * F (corresponding to production 3)
followed by a + or the string F (corresponding to production 4) followed by
a +. This process of adding to S0 all the states which we should be looking
for as a consequence of the states already in S0 is called "computing the
closure of S0". The complete closure of S0 is shown in Fig. 2.

Now that we have S0 we place it on the stack. We examine each state
in S0 to see if we have found all of the specified characters in any of them.
We have not, so we add the first input character, x, to the stack. We then
compute S1 by placing in S1 every state in S0 which has the . immediately to
the left of the character x which was just placed on the stack. We move the .
over the x to indicate that the x has been "found". S1 thus has two states,
[x., *] and [x., +]. Next we compute the closure of S1. It is already closed.
We then place S1 on the stack. Now we proceed as we did when we placed S0 on
the stack. We look to see if any states in S1 have all of the characters
found, and both of them do. Since the next input character is * we ignore
the second state, [x., +], and make the reduction x -> F corresponding to
the first. x is said to be the current "handle". To make this reduction
we remove x and S1 from the stack and replace them with F. Then we form S2
as the closure of those states in S0 which have an F preceded by a . . We
continue as above until the parse is completed with the generation of S12.
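The procedure just described can be sketched in Python as follows. This is only an illustrative reading of the text, not the memo's own program: a state is represented as a triple (production number, dot position, lookahead terminal), the marker "e" for "no input character" and all function names are assumptions, and the state sets are numbered in the order in which they first appear on the stack.

```python
GRAMMAR = {
    1: ("S", ("S", "#")),
    2: ("S", ("E", "+", "E")),
    3: ("E", ("F", "*", "F")),
    4: ("E", ("F",)),
    5: ("F", ("x",)),
    6: ("F", ("y",)),
}
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}
END = "e"  # assumed marker for "no input character"


def first(symbol, seen=frozenset()):
    """Terminals that can begin a string derived from `symbol` (no empty rules here)."""
    if symbol not in NONTERMINALS:
        return {symbol}
    if symbol in seen:
        return set()
    found = set()
    for lhs, rhs in GRAMMAR.values():
        if lhs == symbol:
            found |= first(rhs[0], seen | {symbol})
    return found


def closure(states):
    """Add every state we must look for as a consequence of those already present."""
    result, work = set(states), list(states)
    while work:
        prod, dot, look = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        followers = first(rhs[dot + 1]) if dot + 1 < len(rhs) else {look}
        for number, (lhs, _) in GRAMMAR.items():
            if lhs != rhs[dot]:
                continue
            for terminal in followers:
                if (number, 0, terminal) not in result:
                    result.add((number, 0, terminal))
                    work.append((number, 0, terminal))
    return frozenset(result)


def advance(state_set, symbol):
    """Move the dot over `symbol` wherever it stands just before it, then close."""
    moved = {(p, d + 1, a) for p, d, a in state_set
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def parse(text):
    """Return the successive stages of the stack as strings such as 'S0 x S1'."""
    names = {}                               # state set -> i, numbered on first use
    sets, symbols, stages = [closure({(1, 0, END)})], [], []

    def record():
        parts = ["S%d" % names.setdefault(sets[0], len(names))]
        for sym, st in zip(symbols, sets[1:]):
            parts += [sym, "S%d" % names.setdefault(st, len(names))]
        stages.append(" ".join(parts))

    record()
    pos = 0
    while True:
        look = text[pos] if pos < len(text) else END
        complete = [p for p, d, a in sets[-1]
                    if d == len(GRAMMAR[p][1]) and a == look]
        if complete and complete[0] == 1:
            return stages                    # production 1 found: parse finished
        if complete:                         # reduce the handle to its left side
            lhs, rhs = GRAMMAR[complete[0]]
            del sets[-len(rhs):]
            del symbols[-len(rhs):]
            symbol = lhs
        else:                                # otherwise take the next input character
            symbol = look
            pos += 1
        following = advance(sets[-1], symbol)
        if not following:
            raise ValueError("cannot continue at position %d" % pos)
        sets.append(following)
        symbols.append(symbol)
        record()


for stage in parse("x*y+x#"):
    print(stage)
```

The input is written without blanks. The sketch simply takes the first complete state whose lookahead matches, which is enough for this grammar because no two reductions compete for the same lookahead.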

Note that there are only a finite number of possible states and so
there are only a finite number of distinct Si. It is possible to compute
once and for all each Si which will occur in parsing any string which is
generated by a given grammar which can be parsed by this algorithm. Thus
one could set up an array which would give the action which the parser
should take for each combination of an Si with an input symbol.
The parse is then reduced to table look up and the mechanism is very similar
to a precedence algorithm's parse. However, if there are as many as 200
productions, this array could be very large (even if simplifications to remove
redundant cases are made).

2. An Implementation of the LR(1) Parsing Scheme

We now introduce an alternative approach. The array approach summarizes
each state Si in a single number. However, if the next state is very "similar"
to the last state then an acceptable alternative is to try to represent the
state by many numbers, only a few of which will change with each change of
state. The push-down-list is then used to save only those numbers which change.
In implementing this approach we assume that the depth of the stack can always
be specified as an entry of the array which defines the state. As we will see,
the state can then be defined by allotting entries in this array for each
combination of handle and character which can follow it in some parse. A
number of entries equal to the number of characters in the handle would be
allotted for each such combination. However, in an attempt to simplify the
procedure without destroying its usefulness we will not keep this much
information. We will only keep a list of the characters which cannot follow the
right side of a given production in any sentential form. Then each right side
need only appear once in the array defining the state, instead of once for
each character which can follow it in some sentential form. Then we can
specify the array defining the state as containing one entry for each symbol
in each production of the grammar to be parsed. For the grammar above the
array will have 17 entries.
array entry            1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
corresponding symbol   S   #   (S) E   +   E   (S) F   *   F   (E) F   (E) x   (F) y   (F)

Fig. 3

Entries 1 and 2 correspond to symbols 1 and 2 on the right side of production
1. Entry 3 corresponds to the symbol on the left side of production 1. The
remaining entries correspond to the remaining productions in the same way.
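The numbering of the entries can be reproduced with a few lines of Python. This is a minimal sketch under the assumption that the productions are stored as (left side, right side) pairs; the name `layout` is hypothetical, and index 0 of the list is left unused so that list indices agree with the entry numbers in the text.

```python
# Productions of Fig. 1 in order; each is (left side, symbols of the right side).
GRAMMAR = [
    ("S", ["S", "#"]),
    ("S", ["E", "+", "E"]),
    ("E", ["F", "*", "F"]),
    ("E", ["F"]),
    ("F", ["x"]),
    ("F", ["y"]),
]


def layout(grammar):
    """Label entry i of the state array; left sides are shown in parentheses."""
    labels = [None]                      # entry 0 is unused padding
    for lhs, rhs in grammar:
        labels.extend(rhs)               # one entry per right-side symbol
        labels.append("(" + lhs + ")")   # then one entry for the left side
    return labels


LABELS = layout(GRAMMAR)
for entry in range(1, len(LABELS)):
    print(entry, LABELS[entry])
```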

The entries corresponding to the left side of a production are filled with a
pointer to a function which reduces the stack if that production is found as
a "handle" (in the sense explained in Section 1) not followed by any of the
specified characters. For example, entry 13 above says that the reduction
F -> E should only be made if the next input character is not *. We call the
above array the state array. We also set up a second array whose function is
to describe the entries in the state array. This second array is called the
property array and its entries are in one-to-one correspondence with the
entries of the state array. If an entry in the state array corresponds to a
terminal, the corresponding entry in the property array is zero. If an entry
in the state array corresponds to a non-terminal on the right side of a
production, the entry in the property array is a negative number whose absolute
value is a pointer to a list of every entry in the state array which
corresponds to the first character on the right side of a production whose left
side is this non-terminal. (This list is used to form closures efficiently.) If
an entry in the state array corresponds to a non-terminal which is the left side
of a production, the corresponding entry in the property array is a pointer
to a list of those entries in the state array which correspond to an
appearance of this non-terminal on the right side of some production. These
entries in the property array and the entries in the state array corresponding
to the left side of a production are made once and for all and do not change
during a parse. The current state of the parse is kept in the entries of the
state array corresponding to the symbols on the right sides of productions and
on the push-down-stack.
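A possible construction of the property array is sketched here in Python. The text encodes the two kinds of list as negative and positive pointers; the sketch instead uses tagged tuples ("closure", [...]) and ("appears_at", [...]) as a stand-in, which is an assumption made purely for readability, as are the function and variable names.

```python
GRAMMAR = [
    ("S", ["S", "#"]),
    ("S", ["E", "+", "E"]),
    ("E", ["F", "*", "F"]),
    ("E", ["F"]),
    ("F", ["x"]),
    ("F", ["y"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def build_property_array(grammar):
    symbols, is_left = [None], [False]   # entry 0 is unused padding
    first_entry = {}                     # non-terminal -> first entries of its productions
    for lhs, rhs in grammar:
        first_entry.setdefault(lhs, []).append(len(symbols))
        for sym in rhs:
            symbols.append(sym)
            is_left.append(False)
        symbols.append(lhs)
        is_left.append(True)

    prop = [None]
    for entry in range(1, len(symbols)):
        sym = symbols[entry]
        if is_left[entry]:
            # left side: every appearance of this non-terminal on a right side
            where = [e for e in range(1, len(symbols))
                     if symbols[e] == sym and not is_left[e]]
            prop.append(("appears_at", where))
        elif sym in NONTERMINALS:
            # non-terminal on a right side: entries needed to form its closure
            prop.append(("closure", first_entry[sym]))
        else:
            prop.append(0)               # terminal
    return prop


PROPERTY = build_property_array(GRAMMAR)
print(PROPERTY[10])   # the F at entry 10
print(PROPERTY[15])   # the left side (F) at entry 15
```

For entry 10 this yields the closure list 14 and 16, and for entry 15 the appearance list 8, 10 and 12, which are the lists used in the walk-through of the parse.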

Initially the push-down-stack is empty and these state array entries are
all zero. We then change the state array to represent the state S0 as follows.
Referring to Fig. 2 we see that the stack has depth 1 (n = 1) after S0 is
placed on it. During the parse the stack grows deeper and is then reduced,
but whenever S0 is on the top of the stack, the stack has depth 1. Therefore,
S0 can be interpreted as specifying that we are looking for the character S
in production 1, E in production 2, F in productions 3 and 4, x in production
5 and y in production 6, when the stack is of depth 1. We thus set the state
array to indicate this by placing a 1 in the entries corresponding to these
characters. The array then has the form A0 shown in line 3 of Fig. 4.
The stack is empty.
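The initial setting of the state array can be sketched as a closure over entry numbers. This is an illustrative Python sketch, not the original program: the names are hypothetical, and the entries for left sides (which in the text hold pointers to reduction functions) are simply left at 0 here.

```python
GRAMMAR = [
    ("S", ["S", "#"]),
    ("S", ["E", "+", "E"]),
    ("E", ["F", "*", "F"]),
    ("E", ["F"]),
    ("F", ["x"]),
    ("F", ["y"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def initial_state(grammar):
    """Return the state array A0: a 1 in every entry wanted at stack depth 1."""
    symbols, first_entry = [None], {}
    for lhs, rhs in grammar:
        first_entry.setdefault(lhs, []).append(len(symbols))
        symbols.extend(rhs)
        symbols.append("(" + lhs + ")")  # left-side entry, never "wanted"
    state = [0] * len(symbols)
    work = [1]                           # entry 1: first symbol of production 1
    while work:
        entry = work.pop()
        if state[entry] == 1:
            continue
        state[entry] = 1
        if symbols[entry] in NONTERMINALS:
            # to find this non-terminal we must find the start of its productions
            work.extend(first_entry[symbols[entry]])
    return state


A0 = initial_state(GRAMMAR)
print([entry for entry in range(1, len(A0)) if A0[entry] == 1])
```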

We now describe the procedure for going from state A0 to state A1, which
corresponds to the procedure for going from S0 to S1 in Fig. 2. Each input
character has associated with it a list of the entries in the state array
which correspond to that character. The current input character here, x, thus
has entry 14 associated with it. We then go to entry 14 and see if it
contains a 1, indicating that x is wanted at the current push-down-stack level,
which is 1. It does and so we advance to the next entry, 15. Checking the
corresponding entry in the property array we see that we should make the
reduction x -> F. Since the handle is only one character long the push-down-stack
depth remains 1. The property list entry tells us that entries 8, 10, and 12
correspond to F. Checking entry 12 we see that it is 1, so we advance to entry
13. Here we see that a reduction should be made if the next input character is
not *, but it is, so we abandon this and check entry 10. Entry 10 is 0,
indicating that this F is not needed. Finally we check entry 8. Entry 8 is
1 so we advance to entry 9. We see from the property list that entry 9
corresponds to a terminal, so we place a 2 in entry 9, indicating that we
need a * at depth 2. We save the old value of entry 9 on the push-down-stack,
to be restored when the stack is reduced back to depth 1. This brings us to
state A2 in Fig. 4, which corresponds to S2 in Fig. 2. The next input character
is a * which sends us to entry 9. Entry 9 contains a 2 and so we advance to
entry 10. The corresponding entry in the property array tells us that entry
10 corresponds to a non-terminal, so we place a 3 in entry 10, saving the
previous contents on the push down list. We then obtain the list of entries
needed to compute the closure. These are 14 and 16; we go to 14 and 16 and
place a 3 in them, again saving the old contents. Since 14 and 16 correspond
to terminals the closure is complete. This brings us to state A3 in Fig. 4.
Proceeding in this way we parse the input string as shown in Fig. 4.
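The saving and restoring of entries described above can be sketched as follows. The class and method names are assumptions introduced for illustration; the sketch covers only the bookkeeping (placing a depth in an entry while saving the old contents, and undoing those changes when the stack is reduced), not the choice of which entries to visit.

```python
class StateArray:
    """State array with a push-down list of the values it has overwritten."""

    def __init__(self, size):
        self.values = [0] * (size + 1)   # entries 1..size; entry 0 unused
        self.saved = []                  # push-down list of (depth, entry, old value)

    def place(self, entry, depth):
        """Mark `entry` as wanted at `depth`, saving the previous contents."""
        self.saved.append((depth, entry, self.values[entry]))
        self.values[entry] = depth

    def reduce_to(self, depth):
        """Restore every entry that was changed at a depth greater than `depth`."""
        while self.saved and self.saved[-1][0] > depth:
            _, entry, old = self.saved.pop()
            self.values[entry] = old


a = StateArray(17)
for entry in (1, 4, 8, 12, 14, 16):      # the form A0 described in the text
    a.values[entry] = 1
a.place(9, 2)                            # need a * at depth 2
a.place(10, 3)                           # need the second F at depth 3
a.place(14, 3)                           # closure of that F: x ...
a.place(16, 3)                           # ... and y, both at depth 3
a.reduce_to(1)                           # stack reduced back to depth 1
print(a.values[1:])
```

Keeping the depth alongside each saved value is one simple way to know how far to unwind; the text does not say how the original push-down list was organized.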

3. Key Word Grammars

We define a key word grammar to be the same as a context free grammar
except that the set of terminal characters is left unspecified and productions
may contain arbitrary strings of the unspecified terminal characters. Since
we can't list all of the terminal characters we let α stand for any string of
zero or more terminal characters. A key word grammar is a set of productions
of the form Ap -> Xp1 ... Xpn where each Xpi is either an intermediate, a
terminal, or the symbol α, and Ap is an intermediate. The strings generated
by the grammar are thus patterns containing the symbol α. A string lies in
the language generated by the grammar if it can be made to match one of the
patterns generated by the grammar.
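To make the matching of a string against a single pattern concrete, here is a small Python sketch that translates a pattern into a regular expression. Using regular expressions is a convenience chosen for this illustration and is not the technique of the memo; the names `ALPHA`, `pattern_to_regex` and `matches` are hypothetical.

```python
import re

ALPHA = "α"   # stands for any string of zero or more terminal characters


def pattern_to_regex(pattern):
    """Translate a pattern (list of terminals and ALPHA) into a compiled regex."""
    parts = [".*" if sym == ALPHA else re.escape(sym) for sym in pattern]
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, text):
    """True if the whole of `text` can be made to match `pattern`."""
    return pattern_to_regex(pattern).fullmatch(text) is not None


print(matches(["a", ALPHA, "b"], "axyzb"))
print(matches(["a", ALPHA, "b"], "abc"))
```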

4. Parsing Key Word Grammars

Obviously, key word grammars are as general as context free grammars and
so there will be keyword grammars which cannot be parsed with an algorithm
less powerful (in some sense) than a non-deterministic push-down automaton.
At the other end of the scale, since the strings α may contain any terminal
characters, the precedence relation * holds between every pair of strings of
terminal characters and so a precedence algorithm may not be sufficient to
parse keyword grammars.

We give here a variation of the algorithm which seems to have
enough power to parse interesting keyword grammars. If the algorithm is too
slow in practice one might investigate a precedence algorithm. If it is not
powerful enough we could expand it to LR(k).

The algorithm will not parse all keyword grammars. We therefore should
define a test which a grammar must pass which will guarantee that the grammar
can be parsed by the algorithm. One of the restrictions we make is that
no production can have two adjacent α's or an α as the rightmost character of
the right side of a rule.
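The one restriction stated here is easy to check mechanically. The following Python sketch tests a right side for it; the function name and the representation of a right side as a list of symbols are assumptions, and the full test that a grammar must pass is not given in the text, so this covers only the stated restriction.

```python
ALPHA = "α"   # the symbol standing for an arbitrary string of terminals


def violates_restriction(rhs):
    """True if `rhs` ends in ALPHA or contains two adjacent ALPHAs."""
    if rhs and rhs[-1] == ALPHA:
        return True
    return any(left == ALPHA and right == ALPHA
               for left, right in zip(rhs, rhs[1:]))


print(violates_restriction(["a", ALPHA, "b"]))         # allowed
print(violates_restriction(["a", ALPHA, ALPHA, "b"]))  # two adjacent α's
print(violates_restriction(["a", "b", ALPHA]))         # α is rightmost
```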

5. A Key Word Grammar Parsing Algorithm

Our algorithm is a modification of the procedure given in Section 2.
There, we looked for the characters of a production one by one. Now consider
the production A -> a α b. The α means that after we find the "a" we should
begin looking for a "b", and there can be any number of intervening characters.
Thus, in the notation of Section 2, if we start looking for "b" at depth 2
we want to find "b" at depth 2 or any greater depth. We will indicate this
by placing a -2 instead of a 2 in the state array entry. Now consider the
grammar

1.  A -> a α c
2.  A -> a b B
3.  B -> c

This grammar is ambiguous because the string a b c has two parsings: one
by production 1, with α matching b, and one by production 2, with the c
reduced to B by production 3.

Yet we do not want to throw this grammar out. We can use it to express the useful
idea that any string starting with "a" and ending with "c" should be matched
with production 1 unless it has the specific form abc, in which case
production 2 should be used. Let us see what this implies for the modification of
the procedure in Section 2. If we try to parse the string "abc" we will find
the "a" at depth 1, so we begin looking for a "c" at any depth ≥ 2 and a "b"
at depth 2. We find a "b" at depth 2 and so we look for the "c" at depth 3.
But we are already looking for the "c" at any depth ≥ 2. Therefore, we let
the specific depth 3 dominate the "≥ 2" specification. We place the latter on
a temporary list to be restored if we go on to depth 4, since we don't want
to reject a string like abdc. Finally, in a conflict between "≥ 2" and
"≥ 4" we would allow "≥ 4" to dominate, pushing "≥ 2" onto the main stack.
This is the idea of our modification to Section 2. In order to
implement it we indicate in the property array whether or not each character
on the right side of some production is preceded by α, but we do not put any
entries in the state array for the α's. Then, during parsing, we enter a
negative number instead of a positive number in the state array if the entry
corresponds to a character preceded by α or if we are computing the closure
of such an entry. Note that on making reductions handles corresponding to
the same production can be of different lengths depending on the length of
the strings matched by the α's. Therefore we must use the entries in the
state array to find the left end of a handle.
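The signed encoding of state array entries can be sketched in Python as follows. This is one reading of the text rather than the original program: a positive value n means "wanted at depth n exactly", a negative value -n means "wanted at depth n or deeper", and the two dominance rules mentioned in this section are expressed by a hypothetical function `dominant`. A conflict between two specific depths is not discussed in the text, so the sketch refuses to decide it.

```python
def wanted_at(value, depth):
    """Does a state array entry holding `value` accept a symbol at `depth`?"""
    if value > 0:
        return depth == value        # a specific depth
    if value < 0:
        return depth >= -value       # that depth or any greater depth
    return False                     # 0: not wanted at all


def dominant(a, b):
    """Choose which of two conflicting non-zero specifications stays in the entry."""
    if a == 0 or b == 0:
        raise ValueError("no conflict: one specification is empty")
    if (a > 0) != (b > 0):
        return a if a > 0 else b     # a specific depth dominates "at least n"
    if a < 0:
        return min(a, b)             # of two "at least" forms, the deeper wins
    raise ValueError("two specific depths: case not covered by the text")


print(wanted_at(-2, 5), wanted_at(3, 4))
print(dominant(3, -2), dominant(-2, -4))
```

The specification that loses would be saved (on the temporary list or the main stack, as described above) so that it can be restored later; that bookkeeping is left out of the sketch.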

6. An Example 

Consider the grammar: 

1   S -> S#
2   S -> αE + E
3   E -> F*F
4   E -> F
5   F -> x
6   F -> y

Given the string xx + x the algorithm finds the parsing

[Figure: parse tree of xx + x, with E + E at the top level.]
Major steps in the parse are in Fig. 5. Note that if the reduction of the
initial x to E prevented a correct reduction the string would be rejected.
There are no grammars for which the algorithm will accept incorrect strings,
but there are some for which it will reject correct ones.
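As an independent check on this example, the patterns generated by production 2 can be enumerated and matched against the string by brute force. The Python sketch below does this with regular expressions; it is not the parsing algorithm of Section 5, production 1 (S -> S#) is left out so that the enumeration is finite, and all names are assumptions.

```python
import re
from itertools import product

ALPHA = "α"
# Productions 2 to 6 of the example grammar; production 1 is omitted on purpose.
GRAMMAR = {
    "S": [[ALPHA, "E", "+", "E"]],
    "E": [["F", "*", "F"], ["F"]],
    "F": [["x"], ["y"]],
}


def expand(symbol):
    """All strings of terminals and ALPHA derivable from `symbol`."""
    if symbol not in GRAMMAR:
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        for combo in product(*(expand(s) for s in rhs)):
            results.append([t for part in combo for t in part])
    return results


def matching_patterns(text):
    """Patterns generated from S that the whole of `text` can be made to match."""
    hits = []
    for pattern in expand("S"):
        regex = "".join(".*" if t == ALPHA else re.escape(t) for t in pattern)
        if re.fullmatch(regex, text):
            hits.append(" ".join(pattern))
    return hits


print(matching_patterns("xx+x"))
```

The input is written without blanks. Only one pattern matches, with α taking up the initial x, which agrees with the remark about the initial x in the text.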


array entry    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17
                           (α)
symbol         S   #   (S) E   +   E   (S) F   *   F   (E) F   (E) x   (F) y   (F)   level

[The rows of the figure giving the state array at each level are missing.]

Fig. 5

Parsing the string xx + x with the new algorithm.


7. Conclusion 

Exploring this idea further would make an interesting project for someone
interested in parsing. For example, we could use all of Weizenbaum's ideas,
such as precedence among productions when one calls for a stack
reduction and the other doesn't. Also, we can parse any key word grammar
by modifying Earley's procedure along similar lines.